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We describe a new HMC algorithm variant we have recently introduced and extend the published 
results by preliminary results of a simulation with a pseudo scalar mass value of mps ~ 300 MeV. 
This new run confirms our expectation that simulations with such pseudo scalar mass values 
become feasible and affordable with our HMC variant. In addition we discuss simulations from 
hot and cold starts at mps ~ 300 MeV, which we performed in order to test for possible meta- 
stabilities. 
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1. Introduction 



Even though Wilson's original discretization of the Dirac operator gives rise to one of the 
clearest and best understood formulations of lattice QCD, it shows problems in practice: due to 
explicit chiral symmetry breaking the Wilson operator develops unphysically small eigenvalues, 
which were thought to be responsible for instabilities observed in dynamical simulations at light 
values of the quark masses with the Hybrid Monte Carlo (HMC) algorithm 

However, recently it was discovered that - rather surprisingly - stable simulations with the 
HMC algorithm are possible with values of the pseudo scalar mass mps as low as 380 MeV ^ [3]], 
if a clever combination of fermion determinant preconditioning and multiple time scale integration 
is used 1 . Moreover, the computational costs appear to be affordable, even with mps ~ 300 MeV, if 
the available results for the computational costs are extrapolated to this value of trips- 
in this proceeding we report on progress with the HMC variant we introduced in ref. [^j. 



2. Mass preconditioning 

For simplicity we consider here Nf = 2 mass degenerate flavors of Wilson fermions with 
Wilson-Dirac operator Dw and the Wilson plaquette gauge action 5 g . The lattice action (for one 
flavor) reads 

S = S g + '£\jf{x){D w + m )\lf(x), (2.1) 

X 

where mq is the bare quark mass. For convenience we also introduce the hopping parameter K = 
(2aniQ + 8) _1 and the hermitian Wilson-Dirac operator Q = JsD w . 

The numerical integration in the molecular dynamics part of the HMC algorithm [jl|] is usually 
performed by means of the leap-frog algorithm, which is reversible and area preserving, properties 
that are needed for the HMC algorithm to be exact. We refer to ref. [||] for details on how the 
leap frog algorithm is generalized to multiple time scales. In that reference we also detail how to 
generalize the so-called Sexton- Weingarten (SW) improved integration scheme [Q]. 

While the HMC variant presented in ref. [Q] is based on domain decomposition precondition- 
ing, our variant relies on the so-called Hasenbusch acceleration or mass preconditioning [^]. It was 
realized in ref. [|]] that using the identity 

detQ 2 = det (Q 2 + PL 2 ) det ( ) (2.2) 

with a mass shift jj. can speed up the HMC algorithm, if each of the two determinants on the r.h.s. of 



eq. ( |2.2D is treated by a separate pseudo fermion field and a corresponding pseudo fermion action 
Spf,. The acceleration comes about for the following reason: the condition number of Q 2 + \i 2 and 
Q 2 /(Q 2 + M 2 ) is reduced when compared to the condition number of Q 2 . A reduced condition 
number is expected to lead to a reduced molecular dynamics force and hence allows for larger step 
sizes in the integration. At the same time the inversion of Q 2 + pi 2 is - due to the mass shift - much 
cheaper than the inversion of Q 2 , altogether leading to a net speed up. 



We expect that determinant preconditioning with the n-th root trick n performs similarly well. 
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The original idea was then to choose the mass shift pi such that the condition numbers of 
Q 2 + }i 2 and Q 2 / (Q 2 + jx 2 ) become approximately equal. The speed-up was observed to be around 
a factor of two [0|. 

3. HMC with multiple time scale integration and mass preconditioning 

Motivated by the successful combination of multiple time scale integration and determinant 
preconditioning via domain decomposition in ref. we explored in ref. [^] the idea of com- 
bining mass preconditioning with multiple time scale integration. With mass preconditioning the 
Hamiltonian for the HMC algorithm reads 

# = 5l^M + I> Fr (3.1) 

L x,n i=0 



The strategy is then to tune pi in eq. ( j2.2[ ) such that the more expensive the computation of a certain 
5Spf, is, the less it contributes to the total force. The different parts of the action can then be 
integrated on different time scales At,- chosen according to their force magnitude F t , guided by 
AZiFi = const for all i. 

In ref. [[| we demonstrated that this idea proves to be useful in practice: we compared the 
performance of our HMC algorithm variant to the variant of ref. ^ and to a plain HMC as used in 
ref. [0]. The simulations were done on 24 3 x 32 lattices with j3 = 5.6 and pseudo scalar masses of 
m PS = 665 MeV, 485 MeV and 380 MeV (runs A, B and C). Details for the algorithm parameters 
as well as results for several quantities such as the plaquette expectation value or the vector mass 
my can be found in ref. [gj. In addition to these published results we have one more simulation 
point, corresponding to mps = 294 MeV []|] (run D). Our simulations at this point are still ongoing 
and the history of this run is not yet long enough to be fully conclusive. Nevertheless, we present 
here first performance results for this point. 

The first important observation from our investigations is that for all four aforementioned 
simulation points the preconditioning masses and time scales can be tuned such that simulations 
are stable. Examples for Monte Carlo histories of the plaquette expectation value or AH can be 
found in ref. [^]. 

In order to compare the performance of our HMC variant to other variants we have chosen two 
different measures. The first is the performance figure v = 10~ 3 (2« + 3)Ti nt (P) as introduced in 
ref. [0]. Tmt(P) is the integrated autocorrelation time of the plaquette and n is the number of inte- 
gration steps for the physical operator Q 2 necessary for one trajectory, v represents the number of 
inversions of the operator Q in thousands needed in order to obtain one independent configuration. 
It is clearly algorithm and machine independent, but it does not account for the preconditioning 
overhead, which is at least for our HMC variant not completely negligible. 

The results for the v-values are summarized in table |l| and, while the v-values for our HMC 
variant and the one of ref. [||, ^] are compatible, they are significantly smaller than the values 
extracted for the plain HMC algorithm used in ref. [Qj. Note that our v- value for simulation point 
D (in red) is only based on an extrapolation of T; nt (P) in 1 /m| s and therefore preliminary. 

The second performance measure we used is the number of floating point operations (flops) 
needed to generate 1000 independent configurations of size 24 3 x 40 with a « 0.08 fm. For this 



118/3 



HMC algorithm with multiple time scale integration and mass preconditioning 



Carsten Urbach 





K 


V [this work] 


v§|] 


v® 


A 


0.15750 


0.09(3) 


0.69(29) 


1.8(8) 


B 


0.15800 


0.11(3) 


0.50(17) 


5.1(5) 


C 


0.15825 


0.23(9) 


0.62(23) 




D 


0.15835 


0.35 


0.74(18) 





Table 1: Comparison of v values from this work, ref. [g] (with updates from [[j]) and ref. [ph. 
measure we could compare our HMC variant to the cost formula of ref. [ |To| ] 

C = k( 11 -^\\\-\ (3.2) 

The actual value of K can be found in JTo|]. The result of the comparison is shown in figure [j] as an 
update of the "Berlin Wall" figures of [0,01. In figure [T(a)| we compare our results represented 



by squares to the results of ref. [||] represented by circles. The lines are functions proportional 
to (mps/mv)~ 4 (dashed) and (mps/«iy)~ 6 (solid) with a coefficient such that they cross the data 
points corresponding to the lightest pseudo scalar mass. The diamond represents the preliminary 
result of simulation point D, where the values for T- m x(P) and m v are extrapolated. 

In figure 1(b) we compare to the formula of eq. ^2| [[R]] (solid line) by extrapolating our 



data with (raps/ray) 4 (dashed) and with (raps/my) 6 (dotted), respectively. The arrow indicates 
the physical pion to rho meson mass ratio. Additionally, we add points from staggered fermion 



simulations as were used for the corresponding plot in ref. QUO- Note that all the cost data were 
scaled to match a lattice time extend of T = 40. 

The most important conclusion from figure [j] is that with our HMC variant the "Wall" is 
substantially shifted towards smaller values of the quark mass and that simulations with Wilson 
fermions and mps < 300 MeV become feasible. Although the result for simulation point D is 
preliminary, it nicely confirms the results for larger values of raps, even under the pessimistic 
assumption that the final value might be a factor of two larger. 

4. Thermalization property or meta-stability? 

Dynamical Wilson fermion simulations show the generic property of a first order phase tran- 



sition at the chiral point, as was shown in ref. [12]. At this phase transition point the PC AC quark 



mass jumps from non- vanishing negative to positive values (or vice versa) and the pseudo scalar 
mass assumes a non-zero minimal value, which can be rather large. This minimal value is supposed 
to vanish as a 2 towards the continuum limit, but a reliable information at which value of a it takes 
a value below, say, 300 MeV, is missing. In ref. [|l3|] this value of a was estimated to be around 
0.1 -0.07 fm. 

Since simulation point D has mps ~ 300 MeV and the value of a lies in the aforementioned in- 
terval, it is important to investigate whether at this simulation point a meta-stability is observed. To 
this end we performed for simulation point D two simulations, one started from an ordered and the 
other from a disordered configuration. Both of these two runs reached now a Monte Carlo history 
of about 1000 trajectories, but it is still not completely clear whether the runs have thermalized. 
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Figure 1: Computer resources needed to generate 1000 independent configurations of size 24 3 x 40 at a 
lattice spacing of about 0.08 fm in units of Tflops • years as a function of mps/my- (See text for details.) 



Nevertheless, when measuring the PCAC quark mass for both runs during the thermalization 
we observe that the run which started from a disordered configuration shows a positive value of 
this quantity while the other run has a negative value, indicating a meta-stability as observed in 
ref. [Oj. Only after around 900 trajectories the results of both runs approach each other and seem 
to converge to a positive value of the quark mass. Hence, it seems that at these parameter values 
no meta-stability occurs and the observed signs are simply thermalization effects. Nevertheless, 
this observation emphasizes the importance of checking for meta-stabilities before large scale sim- 
ulations are started. It might also indicate that simulation point D is close to a first order phase 
transition that possibly occurs at lower values of raps. 

5. Conclusion 

In this proceeding we have reported on our progress with a new variant of the HMC algorithm, 
which we introduced in ref. [Q]. The performance of our variant is comparable to the recently intro- 
duced HMC variant with domain decomposition [Q] and clearly superior to a plain HMC algorithm. 
We presented an update of the "Berlin Wall" figure of refs. fli0| , [TTJ] showing that with our HMC 
variant simulations with raps ~ 300 MeV become affordable and do not suffer from instabilities. 

Moreover, we presented results of a check for meta-stabilities at our simulation point with the 
lowest value of mps. We observed signs for a meta-stability during thermalization, which disappear 
only after around 1000 trajectories. 

For the future it would be interesting to understand why the two HMC algorithm variants - the 
ones discussed here and in ref. |Q| - allow for stable simulations with values of the pseudo scalar 
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mass of about 300 MeV. One speculation is that this is due to the infrared regularization of the 
operator spectrum provided by both, mass and domain decomposition preconditioning. Another 
speculation is that the determinant and the forces are not well enough estimated with only one 
pseudo fermion field, leading to possibly large fluctuations in the forces. These fluctuations can be 
reduced by introducing additional pseudo fermion fields. 

Clearly the clarification of these possibilities would be very interesting and it might provide 
important insight to even further improve the HMC algorithm. 
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